Audio visual media encoding system

ABSTRACT

The present invention relates to a method, system and apparatus for encoding audio visual media signals sourced, preferably, from a video conference transmission. The technology provided is adapted to receive a video conference transmission from a computer network, where this video conference transmission includes at least one audio visual signal and at least one protocol signal. One or more protocol signals are then read from the transmission received, with the technology provided applying a selected encoding process to a received audio visual signal, wherein the encoding process selected depends on the contents of a read protocol signal.

RELATED APPLICATIONS

The present application corresponds to International Application No. PCT/NZ2003/000187, filed Aug. 21, 2003, which is based on, and claims priority from, New Zealand Application Serial No. 520986, filed Aug. 23, 2002, the disclosure of which is hereby incorporated by reference herein in its entirety.

TECHNICAL FIELD

This invention relates to an Audio Visual Media Encoding System. Preferably, the present invention may be adapted to encode videoconferences, seminars or presentations made over a computer network for review by an observer, either in real time or at a later time. Reference throughout this specification will also be made to the present invention being used in this situation, but those skilled in the art should appreciate that other applications are also envisioned and reference to the above only throughout this specification should in no way be seen as limiting.

BACKGROUND ART

Video conferencing systems have been developed which allow two-way audio and video communications between participants at remote locations. Participants may, through a common digital transmission network, participate in a real time videoconference with the assistance of cameras, microphones and appropriate hardware and software connected to the computer network used. Videoconferences can be used to present seminars or other types of presentations where additional media such as slides or documents may also be supplied to a further input system or document camera for integration into the video or data stream sent.

As the participants of videoconferences interact in real time with one another, this places a high demand on network bandwidth with the transmission of audio visual content signals. Furthermore, there can be some quality problems with the audio visual content of the conference if the network employed does not have the bandwidth required to run the conference correctly. In such instances the internet protocol packets which make up the stream of signals between participants can be lost or arrive late at a receiver, and hence cannot be integrated effectively in real time into the video and audio played out.

In some instances it is also preferable to supply or stream these video conferencing signals to additional observers who cannot necessarily participate in the conference. These observers may, for example, be interested in a seminar or presentation made but may not necessarily need to, or be able to, attend or participate in the conference in real time. Additional observers may view a stream of audio visual signals in real time as the conference occurs, or alternatively can view this information at a later time as their participation within the conference is not required. This stream may also be made available to conference participants at a later time.

To stream videoconference content to additional observers, the signals generated are normally supplied to an additional encoding computer system. Using current technology such a computer is supplied with an analogue feed of the video and audio signals sourced from videoconference unit cameras and microphones, and subsequently converts, encodes or formats this information into a digital computer system file which can be played by specific software player applications. The actual encoding or formatting applied will depend on the player application which is to subsequently play or display the encoded videoconference. As can be appreciated by those skilled in the art, this encoded information may be streamed or transmitted out to observers in real time, or alternatively may be stored for later transmission to observers.

However, this approach to encoding videoconference content for additional observers suffers from a number of problems.

In the first instance there are losses in accuracy or quality in the resulting formatted output due to the conversion of digital audio and video information to an analogue format for subsequent supply to the encoding computer system. In turn the computer system employed converts these analogue signals back into digital format, resulting in quality and accuracy losses with each conversion made.

Furthermore, the encoding computer used must be provided with an analogue cable connection to the video conferencing equipment, and thereby in most instances must also be located within a room in which one end point of the videoconference is to take place. This requires a further piece of apparatus to be located within the video conferencing room or suite, which must also be set up and configured prior to the conference in addition to the video conferencing equipment itself.

One attempt to address these issues has been made through use of a video conferencing transmission protocol, being ITU H.323, entitled "Packet-based Multimedia Communications Systems". This protocol allows audio visual signals and associated protocol information to be transmitted to a network address from the video conferencing equipment employed, without this network address acting as a full participant to the videoconference call taking place. The additional connection can be described as a streaming end point for the videoconference signals, which can be supplied with the digital audio and visual information required without the digital to analogue to digital conversions required using existing technology.

However, a major complication with the use of this basic protocol arises from the high bandwidth requirements employed in the video conferencing call, and a subsequent streaming of signals to the end point at high bit rates. When re-transmitted to software player applications, the higher bit rate of the supplied input will be present in the output produced, thereby resulting in a large video file or high bandwidth requirements which cannot readily be accessed by low speed connections to the computer network employed.

An improved audio visual media encoding system which addressed any or all of the above problems would be of advantage. A system which could act as an end point for conference calls and could encode or format audio and videoconference content for subsequent streaming or supply to observers across multiple bit rates would be of advantage. A system which could exhibit and provide flexibility and functionality regarding how these video and audio signals are encoded and supplied to observers would also be of advantage.

All references, including any patents or patent applications cited in this specification, are hereby incorporated by reference. No admission is made that any reference constitutes prior art. The discussion of the references states what their authors assert, and the applicants reserve the right to challenge the accuracy and pertinency of the cited documents. It will be clearly understood that, although a number of prior art publications are referred to herein, this reference does not constitute an admission that any of these documents form part of the common general knowledge in the art, in New Zealand or in any other country.

It is acknowledged that the term 'comprise' may, under varying jurisdictions, be attributed with either an exclusive or an inclusive meaning, i.e. that it will be taken to mean an inclusion of not only the listed components it directly references, but also other non-specified components or elements. For the purpose of this specification, and unless otherwise noted, the term 'comprise' shall have an inclusive meaning. This rationale will also be used when the term 'comprised' or 'comprising' is used in relation to one or more steps in a method or process.

It is an object of the present invention to address the foregoing problems or at least to provide the public with a useful choice.

Further aspects and advantages of the present invention will become apparent from the ensuing description which is given by way of example only.

DISCLOSURE OF INVENTION

According to one aspect of the present invention there is provided a method of encoding audio visual media signals, characterised by the steps of:

- (i) receiving a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal, and
- (ii) reading one or more protocol signals, and
- (iii) applying a selected encoding process to a received audio visual signal, said encoding process being selected depending on the contents of said at least one protocol signal read.

According to a further aspect of the present invention there is provided a method of encoding audio visual media signals further characterised by the additional subsequent step of:

- (iv) producing encoded output for a software player application.

According to yet another aspect of the present invention there is provided a method of encoding audio visual media signals substantially as described above, wherein the contents of said at least one read protocol signal are used to detect the time position of at least one keyframe present within an audio visual signal of the videoconference transmission.

According to a further aspect of the present invention there is provided a method of encoding audio visual media signals substantially as described above, wherein the contents of said at least one read protocol signal indicate a content switch present within an audio visual signal of the videoconference transmission.

According to a further aspect of the present invention there is provided a method of encoding audio visual media signals substantially as described above, wherein the encoding process selected associates at least one index marker with the encoded output when a content switch is detected using said at least one read protocol signal.

According to another aspect of the present invention there is provided a method of encoding substantially as described above wherein index markers are associated with the encoded output at the same time position at which a content switch is detected within an audio visual signal of the videoconference transmission.

According to a further aspect of the present invention there is provided a method of encoding audio visual media signals substantially as described above, wherein a read protocol signal provides information regarding any combination of the following parameters associated with an audio visual signal of the videoconference transmission:

- (i) the audio codec employed, and/or
- (ii) the video codec employed, and/or
- (iii) the bit rate of audio information supplied, and/or
- (iv) the bit rate of video information supplied, and/or
- (v) the video information frame rate, and/or
- (vi) the video information resolution.

The present invention is preferably adapted to provide a system and method for encoding audio visual media signals. Preferably these signals may be sourced or supplied from a videoconference transmission, with the present invention being adapted to encode at least a portion of these signals into a format which can be played to other users or observers who are not directly participating in the videoconference. Reference throughout this specification will also be made to video conferences being transmitted using computer networks, which should of course be considered by those skilled in the art to encompass any form of digital transmission network infrastructure or system.

Preferably the present invention may be used to implement an encoding process to be run on a computer system which can execute the method or methods of encoding as described herein. Furthermore, the present invention may also encompass apparatus used to perform such methods of encoding, preferably being formed from a computer system loaded with computer software adapted to execute or implement the present invention. The present invention may be adapted to produce an encoded output which can be played, displayed or otherwise relayed to further users without these new users necessarily needing to participate in the videoconference involved, nor needing to view the encoded output at the same time at which the videoconference takes place.

Preferably apparatus used in conjunction with the present invention to provide the encoding process required may be used to take part directly in the videoconference involved and, in some instances, can be considered a videoconference end point. The apparatus or equipment used to provide such an end point may in turn transcode or re-encode at least one audio visual signal received in conjunction with the videoconference to provide a transcoded audio visual output in conjunction with the present invention. The encoded output produced may be stored to a computer file, or alternatively may be transmitted or streamed to other users once encoded if required.

Preferably, the present invention may be adapted to provide an encoded output file, signal or transmission which can be received or played by a computer based software player application to display audio visual media or content. The encoded output provided using the present invention may, in some instances, be streamed or transmitted to non-participating observers of a videoconference in real time as the videoconference occurs. Alternatively, in other instances, the encoded output provided may be saved to a computer file or files which in turn can be downloaded or transmitted to non-participating observers to be played at a later time.

For example, in some instances the present invention may be adapted to provide an encoded audio visual content output which can be played with Microsoft's Windows Media Player™, Apple's QuickTime Player™ or RealNetworks' RealPlayer™. Furthermore, the players involved may also support the reception of real time streaming of the encoded output to observers as the videoconference involved occurs.

Reference throughout this specification will also be made to the present invention providing encoded output to be played on or by a computer using a computer based software player application. However, those skilled in the art should appreciate that references to computers throughout this specification should be given the broadest possible interpretation to include any form of programmed or programmable logic device. Stand alone personal computers, personal digital assistants, cellphones, gaming consoles and the like may all be encompassed within such a definition of a computer, and in turn may all be provided with software adapted to play the encoded output provided in accordance with the present invention. Those skilled in the art should appreciate that references to computers and computer software applications should not be considered references to personal computers only.

In a further preferred embodiment the encoded output provided may be adapted to be transmitted or distributed over a digital transmission network. This formatting of the encoded output provided allows same to be distributed easily and quickly to a wide range and number of geographically dispersed users if required. Reference throughout this specification will also be made to transmissions of encoded output being made over computer networks. However, those skilled in the art should appreciate that any type of transmission network, system or infrastructure which allows for the transmission of digital signals or digital content may be employed in conjunction with the present invention if required.

Reference throughout this specification will also be made to the encoded output provided being adapted to provide an input for a software based player application for a computer system. However, those skilled in the art should appreciate that other formats or forms of encoded output may also be produced in conjunction with the present invention and reference to the above only throughout this specification should in no way be seen as limiting. For example, in other embodiments the present invention may provide an encoded output which can be played using cellular phones, PDAs, game consoles or other similar types of equipment.

Preferably, the videoconference transmissions made may be transmitted through use of a computer network. Computer networks are well-known in the art and can take advantage of existing transmission protocols such as TCP/IP to deliver packets of information to participants in the videoconference.

In a preferred embodiment, the videoconference transmissions received in conjunction with the present invention may be supplied through a computer network as discussed above. Receiving and encoding hardware employed in conjunction with the present invention may be connected to such a computer network and assigned a particular network or IP address to which these videoconference transmissions may be delivered.

Those skilled in the art should appreciate that reference to computer networks throughout this specification may encompass networks provided through dedicated ethernet cabling, wireless radio networks, and also distributed networks which employ telecommunications systems.

In a further preferred embodiment, hardware or apparatus employed by the present invention may be described as a streaming or streamed end point for the videoconference call involved. A streaming end point may act as a participant to the videoconference without necessarily supplying any usable content to the videoconference call. This end point at a particular address in the computer network may therefore receive all the transmissions associated with a particular videoconference without necessarily contributing usable content to the conference. Those skilled in the art should appreciate that end points as referred to throughout the specification may encompass any apparatus or components used to achieve same, which have also previously been referred to as 'terminals', 'gateways' or 'multi-point control units', for example.

The present invention preferably provides both a method and apparatus or system for encoding audio visual media. The system or apparatus employed may be formed from or constitute a computer system loaded with (and adapted to execute) appropriate encoding software. Such software, through execution on the computer system (and the computer system's connection to a computer network), can implement the method of encoding discussed with respect to the present invention. Furthermore, this computer system may also be adapted to store computer files generated as an encoded output of the method described, or retransmit the encoded output provided to further observers in real time.

Reference throughout this specification will also be made to the present invention employing or encompassing an encoding computer system connected to a computer network which is adapted to receive videoconference transmissions and to encode same using appropriate software.

For example, in one instance the present invention may take advantage of the H.323 protocol for videoconference transmissions made over a computer network. This protocol may be used to supply digital signals directly to an encoding computer system without any digital to analogue to digital conversions of signals being required.

Reference throughout this specification will also be made to the present invention being used to encode audio visual media sourced from a videoconference transmission made over a computer network. However, those skilled in the art should appreciate that other applications are envisioned for the present invention and reference to the above only throughout this specification should in no way be seen as limiting. For example, the present invention may be used to encode other forms of streamed or real time audio visual transmissions which need not necessarily be videoconference based, nor directly related to transmissions over computer networks.

Preferably, the videoconference transmissions received by the encoding computer may be composed of or include at least one audio visual signal or signals and at least one protocol signal or signals.

Preferably, an audio visual signal may carry information relating to audio and/or video content of a videoconference as it occurs in real time. In some instances a single signal may be provided which carries both the audio and visual content of the conference as it is played out over time. However, in alternative situations a separate signal may be provided for each of the audio and video components of the conference if required.

Preferably, the videoconference transmissions received also incorporate or include at least one protocol signal or signals. A protocol signal may carry information relating to the formatting or make up of an audio visual signal, including parameters associated with how such a signal was generated, as well as information relating to the configuration, status, or state of the physical hardware used to generate such a signal. Furthermore, a protocol signal may also provide indications with regard to when the content displayed changes or switches, using feedback or information from the particular hardware used to generate an audio visual signal. In addition, a protocol signal may also provide information regarding how a transmitted audio visual signal was created such as, for example, whether a data compression scheme was used in the generation of the signal, and may also provide some basic information regarding how such a compression scheme operated.

Preferably, the present invention may be adapted to initially read at least one protocol signal received in conjunction with an audio visual signal making up the videoconference transmission. The particular information encoded into such a protocol signal or signals can then be used to make specific decisions or determinations regarding how the incoming audio visual signal should in turn be encoded or formatted for supply to further observers. The information harvested from a protocol signal can be used to select and subsequently apply a specific encoding process or algorithm to produce the encoded output required of the present invention. The exact form of the information obtained from the protocol signal, and the encoding processes available and of interest to an operator of the present invention, will determine which encoding process is selected and applied.
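
By way of illustration only, this selection step might be sketched as follows in Python, assuming a hypothetical ProtocolInfo structure holding fields parsed from a protocol signal. All names and the 128 kbps threshold here are assumptions made for the example, not part of the specification.

    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class ProtocolInfo:
        video_codec: str                       # e.g. "H.263"
        video_bitrate: int                     # bits per second
        keyframe_times: List[float] = field(default_factory=list)

    def encode_high_rate(stream, info):
        """Re-encode preserving the incoming (high) bit rate."""

    def encode_low_rate(stream, info):
        """Transcode down for observers on low speed connections."""

    def select_encoding_process(info: ProtocolInfo) -> Callable:
        # Step (iii): the process applied depends on the contents of
        # the protocol signal read in step (ii).
        if info.video_bitrate > 128_000:
            return encode_low_rate
        return encode_high_rate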

According to a further aspect of the present invention there is provided a method of encoding audio visual media signals characterised by the steps of:

- (i) receiving a videoconference transmission from the computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal, and
- (ii) reading one or more protocol signals, and
- (iii) determining the time position of a keyframe present within an audio visual signal received, and
- (iv) encoding a keyframe into the encoded output at the same time position at which the keyframe was detected in the original received audio visual signal.

In a preferred embodiment, information obtained from a protocol signal may include or indicate the time position or location of keyframes present within the audio visual signal or signals received.

Keyframes are generated and used in digital video compression processes, and provide the equivalent of a full traditional video frame of information. In addition to keyframes, pixel modification instructions can be transmitted as the second portion of the video information involved. A keyframe (which incorporates a significant amount of data) can be taken, and then further information regarding the change in position of objects within the original keyframe can be sent over time, thereby reducing the amount of data which needs to be transmitted as part of an audio visual signal.

This approach to video compression does however approximate the actual frames which composed the original video signal, as whole original frames (the keyframes) are only transmitted or incorporated occasionally. If a previously compressed video signal is subsequently re-encoded or 'transcoded', these keyframes may be lost, or a new keyframe may be selected which was not originally a keyframe in the starting compressed video. This can degrade the quality or accuracy of the resulting re-encoded or re-formatted video signal.

However, in conjunction with the present invention, the time position of each of the keyframes employed can be extracted or detected from protocol information. This allows the same keyframes to then be re-used in the re-encoding or re-formatting of the video content of the audio visual signal while minimising any subsequent loss of quality or introduction of further inaccuracies. In such instances, keyframes are encoded into the encoded output at the same time as keyframes are detected in an audio visual signal of the videoconference transmission involved.
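
A minimal sketch of this keyframe re-use follows; the frame and encoder objects and their members are assumed helpers for the example, not interfaces defined by the specification.

    def transcode(frames, keyframe_times, encoder, tolerance=0.02):
        for frame in frames:
            # force an output keyframe wherever the original signal
            # carried one; otherwise allow a predicted (delta) frame
            force_key = any(abs(frame.time - t) <= tolerance
                            for t in keyframe_times)
            encoder.encode(frame, force_keyframe=force_key)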

According to another aspect of the present invention there is provided a method of encoding audio visual media signals characterised by the steps of:

- (i) receiving a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal, and
- (ii) reading one or more protocol signals to determine the encoding characteristics of the received videoconference transmission, and
- (iii) receiving encoding preferences from at least one user, and
- (iv) selecting from a set of encoding processes a subset of encoding processes which can be implemented using the user's encoding preferences and the encoding characteristics, and
- (v) displaying the subset of encoding processes to a user.

In a preferred embodiment, the present invention may also provide a user interface facility which allows a user or operator to set up how they would prefer incoming audio visual signals to be encoded or formatted. An operator may supply encoding preferences or input information with such a user interface, which can in turn be used to tailor the characteristics of the encoded output produced.

In a further preferred embodiment, information or parameters regarding the characteristics of an incoming audio visual signal may also be extracted from one or more protocol signals. These encoding characteristics of the received videoconference transmission may be used in conjunction with information supplied by a user to determine a potential encoding scheme or schemes to be selected in a particular instance.

In a preferred embodiment the received encoding characteristics and encoding preferences may be used to select, from several potential encoding processes, a subset of encoding processes which can actually be implemented to meet the user's preferences based on the encoding characteristics of the received videoconference transmission. Preferably this subset of possible or available processes may be displayed to a user for subsequent selection of one or more processes for use.

In yet a further preferred embodiment, the present invention may include the facility to pre-calculate or pre-assess a number of encoding schemes which will potentially produce the best resulting encoded output based on both the user's encoding preferences and the encoding characteristics obtained from a protocol signal or signals. In such instances, a subset of available or possible encoding processes may still be presented or displayed to a user, but the system or software provided may make a recommendation as to the best potential process for a user to select.

This facility can operate like a user interface "wizard", so that the user will be presented with a facility to select and use only encoding schemes which are capable of satisfying the user requirements or parameters supplied, based on the information extracted from a protocol signal or signals associated with an incoming videoconference transmission.

For example, in one preferred embodiment, a user may input a required bit rate for the resulting encoded output in addition to the software player format required for the resulting output. Further information may also be provided by a user with respect to the number of monitors they wish to simulate from the videoconference call.

Information regarding the make-up or characteristics of an incoming audio visual signal can then be obtained from one or more protocol signal or signals. For example, in one instance, this information obtained from a protocol signal may include any combination of the following:

- (i) the audio codec employed
- (ii) the video codec employed
- (iii) the audio bit rate
- (iv) the video bit rate
- (v) the video frame rate
- (vi) the video resolution.

With this information available, the software associated with or used by the present invention can then make a selection, or present a range of options to a user indicating which audio and/or video codec to use, as well as the particular video resolution and video frame rates available for use which will satisfy the input criteria originally supplied by the user.
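
As an illustration only, such a "wizard" filter could be sketched as below; the profile fields and comparison rules are assumptions for the example rather than rules taken from the specification.

    def feasible_profiles(profiles, user, incoming):
        """Keep only encoding profiles implementable for this call."""
        subset = []
        for p in profiles:
            if p.player_format != user.player_format:
                continue                  # wrong player output format
            if p.total_bitrate > user.target_bitrate:
                continue                  # exceeds the user's bit rate
            if p.frame_rate > incoming.video_frame_rate:
                continue                  # cannot exceed source frame rate
            subset.append(p)
        return subset                     # displayed for final selection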

In a preferred embodiment information may be obtained from at least one protocol signal which indicates a content switch present within the audio visual signal or signals received. Such a content switch may indicate that audio visual signals are generated by a new or different piece of hardware, or that the configuration of a currently used camera or microphone has been modified.

For example, in some instances a protocol signal may indicate that a video freeze picture request signal has been received as part of a videoconference transmission. This freeze signal will hold the current frame or picture making up the video content of the conference on the screens of all participants, and hence will indicate that a content switch has taken place. In this way a change from dynamic to static content can be detected. The transmission of a freeze picture release control command, or the removal of the freeze picture request signal within a protocol signal, may also be detected as a content switch in conjunction with the present invention.

Furthermore, a content switch may also be detected through a protocol signal indicating whether a document camera is currently being used to provide a video feed into the conference. Such a document camera may show good quality close views of printed material as opposed to the participants of the conference. As such, the activation or use of a document camera and the integration of a document camera signal, or the removal of a document camera signal from a protocol signal, can in turn indicate that the content of the video signals transmitted has switched or changed.

In yet another instance a protocol signal may carry status information indicating that a digital image or digital slide is currently to form the video content of the conference. Such an image incorporation or still image indicator signal within a protocol signal may again be used to detect a content switch. A still image or 'snapshot' may be presented as the video content of the conference, with this image sourced from a digital file, digital camera, video recorder, or any other compatible or appropriate type of data or information input system. Furthermore, content flagged or indicated as a snapshot or still image by protocol signals may also be sourced directly from a document camera with the videoconferencing equipment if required. In addition, the removal of such still image information may also be used to indicate a content switch.

Furthermore, content switches may also be detected through the automated panning or movement of a video camera between a number of pre-selected viewing positions or angles. These viewing positions may be pre-set to focus a camera on selected seating positions and their associated speakers, so that when the camera preset viewing angle changes, the content switch involved can be indicated by information present within a protocol signal. Therefore, the integration of a camera movement signal into a protocol signal can be used to detect a content switch.

In a further embodiment of the present invention a site name may be associated with each end point of a videoconference, where audio visual signals transmitted from each site also have the site name embedded into a protocol signal or signals associated with these audio visual transmissions. A content switch may be detected through a change in the name associated with an audio visual signal or signals, where the name associated with each signal may furthermore be used to index, search through or classify the content involved depending on the site at which each portion of content is generated.
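
Gathering the indications described above into one place, a sketch of the detection step might read as follows. The event names are illustrative labels chosen for this example, not identifiers drawn from H.323.

    # Protocol-signal events the description treats as content switches.
    CONTENT_SWITCH_EVENTS = {
        "freeze_picture_request",   # dynamic -> static content
        "freeze_picture_release",
        "document_camera_on",       # document camera feed added
        "document_camera_off",
        "still_image_start",        # snapshot or slide displayed
        "still_image_end",
        "camera_preset_change",     # pan to a pre-set viewing position
        "site_name_change",         # transmitting site changed
    }

    def is_content_switch(event: str) -> bool:
        return event in CONTENT_SWITCH_EVENTS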

According to another aspect of the present invention there is provided a method of encoding audio visual media signals characterised by the steps of:

- (i) receiving a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal, and
- (ii) reading one or more protocol signals, and
- (iii) detecting a content switch within the audio visual content of a received audio visual signal, and
- (iv) encoding an index marker at the time position at which the content switch was detected.

According to a further aspect of the present invention there is provided a method of encoding audio visual media signals substantially as described above characterised by the steps of:

- (i) receiving a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal, and
- (ii) reading one or more protocol signals, and
- (iii) detecting a content switch within the audio visual content of a received audio visual signal, and
- (iv) encoding a keyframe, and
- (v) encoding an index marker at the same time position as, or adjacent to, the position of the keyframe encoded.

According to yet another aspect of the present invention there is provided a method of encoding substantially as described above wherein index markers are encoded within a time threshold from the time position of keyframes.

In a preferred embodiment, the detection or indication of a content switch within an audio visual signal may trigger the association of at least one index marker with the encoded output provided, where this index marker is associated with substantially the same time position in the encoded output as that at which the content switch was detected in the incoming audio visual signal or signals.

In a further preferred embodiment index markers may be associated with the same time position at which a content switch was detected in the original incoming audio visual signal or signals involved. Those skilled in the art should appreciate, however, that some degree of variation in the exact placement or positioning of the index marker involved will occur due to the physical limitations of the software and equipment employed in conjunction with the present invention. However, in alternative embodiments the index marker involved may be associated with encoded output within a set time threshold period. In such instances, a degree of latitude may be allowed with respect to when an index marker is to be encoded, with the threshold distance or period involved dictating the degree of latitude allowed.

Furthermore, an index marker encoded may also include reference information regarding how the particular content switch was detected, and therefore may give an indication as to what the content of the audio visual signal is at the particular time position at which the index marker is located.

In a preferred embodiment an index marker may be associated with the encoded output provided through the actual encoding of a reference, pointer, URL or other similar marker directly within the encoded output provided. This marker or reference may then be detected by a player application at approximately the same position as the content switch of the video content in place. However, in other embodiments an index marker may not necessarily be directly encoded into the output to be provided. For example, in one embodiment a log file or separate record of index markers may be recorded in addition to time position or location information associated with the video signal involved. This file can indicate at which particular time positions an index marker is associated with the video content involved.

In a further preferred embodiment, an index marker may be implemented through the insertion of a universal resource locator (URL) into the encoded output produced by the present invention. Those skilled in the art should appreciate that URLs are commonly used in the art to index audio visual media, and as such the present invention may employ existing technology to implement the index markers discussed above.
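
The two marker strategies above (in-stream URL versus separate log file) might be sketched as follows; the writer and log file interfaces are assumptions made for the example rather than any particular player's API.

    def add_index_marker(stream_writer, log_file, time_pos, label,
                         embed=True):
        if embed:
            # encode a URL reference directly into the output stream
            stream_writer.write_url_event(time_pos, f"marker://{label}")
        else:
            # leave the output untouched; record the marker out-of-band
            log_file.write(f"{time_pos:.3f}\t{label}\n")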

Preferably, these index markers encoded into the output provided may be used by the user of a player application to proactively seek or search through the audio visual output of the present invention, depending on the particular content which these index markers reference. An index marker may mark the time position or location in the encoded output at which selected types of content are present, and subsequently allow a user to easily search the entire output produced for a selected portion or type of content.

In a further preferred embodiment, the presence of original keyframes within an incoming audio visual signal or signals in proximity to the time position at which an index marker is to be encoded can also be detected in conjunction with the present invention.

If too many keyframes are located in proximity to one another this will degrade the resulting encoded output of the present invention, potentially affecting both frame rate and quality. However, it is preferable to have a keyframe close to an index marker in the encoded output, as this will allow a software player application to seek to the time position of the index marker and quickly generate the video content required using a nearby keyframe.

Preferably, through detecting whether an original keyframe is near to the time position at which an index marker is to be encoded, the present invention may optimise the placement of keyframes in the resulting encoded output. If no keyframe is present within a specified threshold time displacement tolerance, a new keyframe may be encoded just before, just after, or at the same time position as that at which the index marker is to be encoded. Conversely, if a keyframe is available within the threshold time period, no new keyframe may be generated or incorporated into the resulting encoded output. In this manner, a keyframe may be encoded into the encoded output at the same time position as, or adjacent to, the time position of the index marker involved.
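
A sketch of this placement rule, assuming time positions in seconds and an illustrative threshold value (the specification does not fix a particular figure):

    def keyframe_needed(marker_time, keyframe_times, threshold=2.0):
        """True if a fresh keyframe should be encoded at the marker."""
        nearby = any(abs(t - marker_time) <= threshold
                     for t in keyframe_times)
        return not nearby   # re-use a nearby original keyframe if any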

According to a further aspect of the present invention there is provided a method of encoding audio visual media signals characterised by the steps of:

- (i) receiving a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal, and
- (ii) reading one or more protocol signals, and
- (iii) detecting the existence of a low content state present within a received audio visual signal, and
- (iv) time compressing the encoded output content during the time period in which said low content state is detected within the videoconference transmission received.

According to a further aspect of the present invention there is provided a method of encoding audio visual media substantially as described above wherein a buffer is used to receive videoconference transmission signals, whereby the rate at which the contents of the buffer are played out into an encoding process determines the degree of time compression applied to the original videoconference audio visual content when encoded.

In a preferred embodiment, the present invention may also be used to modify the timing or time position of particular portions of audio visual content present within the encoded output when compared to the original audio visual signal or signals provided. This timing modification may be completed if a particular content switch is detected through reading a protocol signal or signals.

In a further preferred embodiment, the encoded output may be time compressed when a low content state is detected within a received audio visual signal using at least one read protocol signal. Such low content states may persist for random periods of time and, if encoded directly into the encoded output, may make for a stilted or slow presentation of content. The detection of a low content state (preferably through data or flags in at least one protocol signal) can allow the audio visual content present within the encoded output to be speeded up if required.

In a further preferred embodiment the video and audio content received may be time compressed if a fast picture update or a freeze or hold picture control instruction is detected in a protocol signal. Normally these instructions or signals are associated with the transmission of large amounts of image information between participants in the videoconference, which can take some time to arrive and be assembled at a particular end point. This in turn can provide a relatively stilted presentation, as the participants' interest in the current frozen image or picture may have been exhausted before all of this information has been received and subsequently displayed.

Through use of the present invention, this information may instead be pre-cached and subsequently displayed for a short period of time only. The audio content of the conference may also be compressed over time to synchronise the audio and visual content portions, provided that limited audio content is also generated over the time at which the still image or frozen frame is displayed.

In a further preferred embodiment a buffer may be used to time compress the audio visual content of the encoded output. In such embodiments, a buffer or buffer-like component or data structure can be used to initially receive audio visual signals, so that the rate at which the contents of the buffer are played out into an encoding process will in turn determine the degree of time compression applied to the videoconference content when encoded. When time compression is to occur over a selected period in which a low content state is detected, the contents of the buffer may be played out to an encoding process at a faster rate than normally employed.

Furthermore, preferably when a Freeze Picture Release command or signal is received in a protocol signal, the contents of the buffer can be played out slower than normal until the buffer has made up the amount of content that it played out faster previously.
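
A sketch of this adaptive playout follows; the 1.5x and 0.75x rates are assumptions for the example, not values from the specification.

    def playout_rate(low_content: bool, time_debt: float) -> float:
        """Rate at which the buffer is drained into the encoder."""
        if low_content:
            return 1.5     # time-compress the stilted segment
        if time_debt > 0:
            return 0.75    # after release, pay back what was skipped
        return 1.0         # otherwise play out in real time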

The present invention may provide many potential advantages over the prior art. The present invention may read and subsequently employ information from a protocol signal or signals to make intelligent decisions regarding how an audio visual signal or stream should be encoded or re-formatted.

Information may be obtained from such protocol signals regarding the original keyframe placement within the incoming audio visual signal, with this information in turn being employed to re-use the same keyframes in the output audio visual information provided. Furthermore, this technique may also be of assistance where particular content switches within the received audio visual signal are detected and indexed in the encoded output provided. These index markers supplied can allow a user to proactively seek or search through the resulting encoded output quickly for particular types of content. Furthermore, the keyframe placement information obtained from a protocol signal can also be used to ensure that a keyframe is placed in close time proximity to such index markers, thereby allowing the video information required to be generated and displayed quickly to a user.

Information obtained from a protocol signal or signals may also be used to assist in the selection of a particular encoding scheme or profile for an incoming audio visual signal or signals. Based on user preferences or selections, and in conjunction with information relating to the characteristics of an incoming audio visual signal obtained from a protocol signal, a user may be presented with a limited number of coding schemes which will produce the best results for the input information that is supplied.

The present invention may also provide a facility to compress, with respect to presentation time, selected types of content present within an incoming audio visual signal or signals. If a relatively stilted or slow content portion is detected within an incoming videoconference (such as a freeze picture segment), the time over which the content is present may be compressed in the encoded output provided.

BRIEF DESCRIPTION OF DRAWINGS

Further aspects of the present invention will become apparent from the following description, which is given by way of example only and with reference to the accompanying drawings in which:

FIG. 1 shows a block schematic flowchart diagram of steps executed in a method of encoding audio visual media signals in conjunction with a preferred embodiment, and

FIG. 2 illustrates in schematic form signals involved with the encoding process discussed with respect to FIG. 1, and

FIGS. 3 a, 3 b, 3 c show in schematic form signals with encoded keyframes as discussed with respect to FIG. 2, and

FIG. 4 shows a user interface and encoding scheme selection facility provided in accordance with another embodiment of the present invention, and

FIGS. 5 a, 5 b, 5 c show a series of schematic diagrams of signals both used and produced in accordance with a further embodiment of the present invention, and

FIGS. 6 a, 6 b & 6 c again show schematically a set of signals received and subsequently produced in accordance with yet another embodiment of the present invention, and

FIG. 7 & Table 1 show a process flowchart and related pseudo code detailing steps taken in the insertion or encoding of a keyframe in conjunction with a preferred embodiment of the present invention, and

FIGS. 8 & 9 and Tables 2 & 3 illustrate the encoding of keyframes and index markers in accordance with a further embodiment of the present invention, and

FIG. 10 & Table 4 illustrate the provision of an adaptive content playout mechanism employing a buffer to accelerate the encoding of content when low content states are detected.

BEST MODES FOR CARRYING OUT THE INVENTION

FIG. 1 shows a block schematic flowchart diagram of steps executed in a method of encoding audio visual media signals in conjunction with a preferred embodiment.

In the first step of this method an encoding computer system connected to a computer network receives a videoconference transmission from the computer network. This videoconference transmission includes audio visual signals and a set of protocol signals. The protocol signals provide information regarding how the audio visual signals were generated, in addition to the status of the particular hardware equipment used to generate these signals.

In stage two of this method, information is extracted from the protocol signals received in stage one. In the embodiment discussed with respect to FIGS. 1 and 2, the information extracted from these protocol signals consists of an indication of the time position at which keyframes are encoded into the original audio visual signals received, and also information regarding when a particular content switch occurs within the audio visual information employed. In the embodiment considered, a content switch is detected through the use of a document camera as opposed to a camera which shows the participants of the conference.

At stage three of this method a specific encoding process is selected for application to the received audio visual signals based on the information present within the protocol signals read. In the instance discussed, the encoding process selected incorporates specific index marker references into the output provided to indicate the content switch present within the audio visual information when a document camera is used. The encoding process selected also takes into account the position of each of the keyframes encoded into the original audio visual signal, and adjusts its generation or application of keyframes within the encoded output produced based on the time positions of the original keyframes used.

In step four of this method the encoded output of the method is generated and produced for a particular software player application. In the instance discussed with respect to FIGS. 1 and 2, the encoded output provided may be played on RealNetworks' RealPlayer.

FIG. 2 illustrates in schematic form elements of the encoding process discussed with respect to FIG. 1, showing an original audio visual signal (5) and the subsequent encoded output audio visual signal (6).

The original signal (5) includes a number of keyframes (7) distributed at specific time positions along the playing time of the signal (5). The original signal (5) also incorporates specific content switches between video showing the conference participants (8) and a still image or snapshot (9) taken from the video camera trained on the conference participants.

The re-encoded signal (6) takes advantage of information obtained from protocol signals received from an incoming videoconference transmission to detect the presence of the keyframes (7) and the content switches taking place. Index markers (10) (formed in a preferred embodiment by URLs) are inserted into the encoded output signal (6) to indicate the presence of a content switch in the audio visual content of the signal.

Where possible, the original keyframes (7) of the incoming audio visual signal (5) are also recycled or reused, as shown by the placement of the first keyframe (11 a) in the second signal (6). However, in the instance shown, a new keyframe (11 b) is generated and encoded into the second signal (6) to provide a keyframe in close proximity to an index marker indicating the presence of a content switch in the audio visual information to be displayed. In this instance the second keyframe (7 b) of the original signal is not re-encoded or reused within the second signal (6).

FIGS. 3 a through 3 c show an incoming video stream (3 a), a video stream which is re-encoded without use of the present invention (3 b), and a video stream re-encoded using the present invention (3 c), where information regarding the original keyframe placements of the original video stream (3 a) is employed.

As can be seen from FIG. 3 b, without use of the present invention a transcoded or re-encoded video signal does not necessarily have keyframes placed at the same positions or locations as those provided in the signal shown with respect to FIG. 3 a. Conversely, in FIG. 3 c the keyframes employed are positioned at essentially the same time positions as the original keyframes within the original streamed video signal.

FIG. 4 shows a user interface and encoding scheme selection facility provided in accordance with another embodiment of the present invention.

In the instance shown an encoding computer system (12) is provided with a connection (13) to a computer network (14). This computer network (14) can carry videoconference transmissions to be supplied to the encoding computer (12), which acts as an encoding end point for the videoconference. The encoding computer (12) transmits muted audio and blank video signals so as to be maintained as a participant to the conference, and is adapted to provide further encoded audio visual output sourced from the audio visual signals employed within the videoconference transmission.

A user interface module (15) may be provided in communication with the encoding computer (12) through a separate user computer, or through software running on the same encoding computer (12). This user interface (UI) module can initially send user parameter information (16) to the encoding computer system. The encoding computer system (12) can also extract audio visual signal parameter information from protocol signals received as part of the videoconference transmissions, where these parameters give information regarding the audio visual signals making up part of the video transmission. These parameters can provide information relating to the make up of an incoming audio visual signal such as:

- (i) the audio codec employed, and
- (ii) the video codec employed, and
- (iii) the bit rate of audio information supplied, and
- (iv) the bit rate of video information supplied, and
- (v) the video information frame rate, and
- (vi) the video information resolution.

The encoding computer system may, using all of the user and protocol information obtained, calculate a number of "best fit" encoding schemes which can be used to meet the requirements of a user for an incoming video stream. Information regarding valid encoding schemes may then be transmitted (17) to the UI module, which in turn allows a user to transmit a scheme selection instruction (18) back to the encoding computer (12) to indicate which encoding scheme should be employed.

Based on these instructions, the encoding computer system may encode and produce output (19) which can be played on a suitable computer based media player application.

The process used to select or specify a set of encoding schemes which may be used is also shown in more detail through the pseudo code set out below.

    H.323 call parameters:
        H.263 video @ 112kbps
        H.263 video resolution @ CIF
        H.263 video frame rate @ 12.5fps
        G.728 audio @ 16kbps

    User input:
        Bitrate: 56kbps Modem
        Player format: RealMedia Native - Single Stream
        Display mode: Single Monitor

    Profiler decisions:
        // find the media type for the stream
        // either standard (video and audio only) or presentation
        // (audio, video and snapshots)
        If Display_Mode = Single_Monitor then
            Profiler_Media_Type = (standard)
        Else
            Profiler_Media_Type = (presentation)
        EndIf

        // find the maximum audio bitrate for the stream based on the
        // media type; where media type is standard, allow more bitrate
        // to the audio codec than if media type of presentation is
        // selected (when presentation, need to leave bandwidth for the
        // snapshot).
        User_Bitrate = (56kbps) and Profiler_Media_Type = (standard)
        therefore
        Max_Audio_Bitrate = (8.5kbps)

        // select the audio codec for use in the stream based on the
        // maximum available bandwidth.
        If Incoming_Audio_Bitrate > Max_Audio_Bitrate then
            Profiler_Audio_Codec = Select Audio_Codec from Table_3
                where Bitrate_Supported <= Max_Audio_Bitrate
            therefore
            Profiler_Audio_Codec = (RealAudio_8.5kbps_Voice)
        Else
            Profiler_Audio_Codec = Incoming_Audio_Codec
        EndIf

        // set the video bandwidth based on total available bandwidth
        // and bandwidth used by the audio codec.
        Profiler_Optimum_Bitrate = Select Optimum_Bitrate from Table_4
            where Bandwidth_Option = (56kbps_Modem)
        If (Profiler_Audio_Codec <> Incoming_Audio_Codec) then
            Profiler_Audio_Bitrate = Select Bitrate_Supported from Table_3
                where Audio_Codec = (Profiler_Audio_Codec)
        Else
            Profiler_Audio_Bitrate = Incoming_Audio_Bitrate
        EndIf
        Profiler_Video_Bitrate = Profiler_Optimum_Bitrate - Profiler_Audio_Bitrate
        therefore
        Profiler_Video_Bitrate = (29.5kbps)

        // set video resolution
        Profiler_Video_Res = Select Optimum_Resolution from Table_4
            where Bandwidth_Option = (56kbps_Modem)
        therefore
        Profiler_Video_Res = (176×144)

        // set video codec
        If User_Player_Format = RealMedia_Native then
            Profiler_Video_Codec = (RealVideo9)
        EndIf

        // set video frame rate
        Max_Profiler_Frame_Rate = Incoming_Frame_Rate
        Profiler_Frame_Rate = Select Optimum_Frame_Rate from Table_4
            where Bandwidth_Option = (56kbps_Modem)
        If Profiler_Frame_Rate > Max_Profiler_Frame_Rate then
            Profiler_Frame_Rate = Max_Profiler_Frame_Rate
        EndIf

FIGS. 5 a through 5 c show a series of schematic diagrams of signals associated with the present invention, and illustrate further behaviour of the invention depending on the input signals it receives.

FIG. 5 a shows an incoming protocol signal which indicates that a snapshot event occurs at frame 150 of the video signal shown with respect to FIG. 5 b. FIG. 5 b also shows that a keyframe has been encoded into the original incoming video at frame 125.

FIG. 5 c shows the encoded video output provided in conjunction with the present invention in the embodiment shown. This figure illustrates how the invention can be used to place a keyframe in its encoded output signal depending on the input videoconference transmissions received.

The software employed by the present invention makes a set of decisions in the instance shown. The first of these decisions is completed through considering a set value for the maximum time displacement between keyframes which should be present in the encoded output signal. In the instance shown a keyframe is to be encoded every one hundred and fifty frames, and as a keyframe is provided at frame 125, this original keyframe is subsequently used in the encoded output (FIG. 5c).

Secondly, the software employed notes that an index marker is to be encoded or written to the output provided at frame 150 to mark the placement of the snapshot event in the incoming video signal. By considering a tolerance value for time displacement from this index marker, the software employed can see that the keyframe present at frame 125 is within this tolerance and an additional keyframe does not need to be encoded just before the snapshot event at frame 150.
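This tolerance check reduces to a single comparison. The following Python fragment is a minimal sketch only; the function name and the tolerance value of 30 frames are assumptions for illustration, as the specification does not fix them:

    INDEX_TOLERANCE = 30   # assumed tolerance, in frames, around an index marker

    def needs_forced_keyframe(index_marker_frame, last_keyframe_frame):
        """True when no keyframe lies within tolerance of the index marker,
        so one must be force-encoded just before the marker."""
        return (index_marker_frame - last_keyframe_frame) > INDEX_TOLERANCE

    # FIG. 5: snapshot event at frame 150, original keyframe at frame 125.
    # The keyframe is within tolerance, so no extra keyframe is encoded.
    print(needs_forced_keyframe(150, 125))   # False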

FIGS. 6a, 6b and 6c show a set of signals illustrating further behaviour of the present invention in yet another embodiment. In the embodiment shown an incoming protocol signal is shown with respect to FIG. 6a, an incoming video signal is shown with respect to FIG. 6b, whereas the encoded output video provided in conjunction with the present invention is shown as FIG. 6c.

In this example the incoming video includes keyframes at frames 275 and 402, with a video fast update picture protocol signal at frame 398. By comparison, the encoded output provided includes keyframes at frames 250 and 402 respectively. In the instance shown a decision is made to encode the output to be provided so that keyframes are located a maximum of 150 frames apart. However, this maximum time between keyframes may be varied depending on the particulars of the incoming signal, as discussed below.

When the original keyframe located at frame 275 in the incoming signal is detected, a decision is made by the software employed not to encode a keyframe in the output due to its proximity to the previous encoded keyframe provided at frame 250. One hundred and fifty frames from frame 250, a keyframe should be encoded based on the maximum time between keyframes value. However, in this case it is not encoded, as the protocol signal at frame 398 shows that a keyframe is expected in the following frames. In this case the maximum time between keyframes is extended slightly to allow for the keyframe associated with the video fast picture update to be delivered. This keyframe arrives in the incoming video at frame 402 and is then encoded in the output video at frame 402.
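This deferral behaviour can be sketched in Python. The sketch below is illustrative only; the names are assumptions, and the slight extension of the spacing is represented implicitly by declining to force a keyframe while a fast update is pending:

    MAX_SPACING = 150   # maximum frames between encoded keyframes

    def should_encode_keyframe(frame_no, last_kf, update_pending, is_keyframe):
        """Decide whether to encode this frame as a keyframe in the output."""
        elapsed = frame_no - last_kf
        if is_keyframe:
            # Encode an incoming keyframe only once enough frames have passed.
            return elapsed >= MAX_SPACING
        if elapsed >= MAX_SPACING and update_pending:
            # A fast picture update promises a keyframe shortly: extend the
            # spacing slightly instead of force-encoding an ordinary frame.
            return False
        return elapsed >= MAX_SPACING

    print(should_encode_keyframe(275, 250, False, True))   # False: too close to 250
    print(should_encode_keyframe(400, 250, True, False))   # False: keyframe expected
    print(should_encode_keyframe(402, 250, True, True))    # True: encode at 402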

FIG. 7 & Table 1 show a process flowchart and related pseudo code detailing the steps taken in the insertion or encoding of a keyframe in conjunction with a preferred embodiment of the present invention.

The process described initially receives a frame from the decoding elements or components of video conferencing equipment which forms an end point to a video conferencing call.

The frame received is initially investigated to determine whether it is intra-coded, or forms a keyframe in the audio visual signals received in conjunction with the videoconference involved. This keyframe test is implemented by checking the number of actual INTRA-coded macroblocks within the frame, where a maximum possible INTRA-coded macroblock count indicates the presence of a keyframe.

If the frame is not confirmed as a keyframe, the process then checks to determine whether the video conferencing systems involved have transmitted a fast picture update to the source of the videoconference transmission, where such a fast picture update requests the transmission of a keyframe.

If a keyframe is not expected, the received frame is tested to determine its quality, namely the proportion or percentage of macroblock elements it contains when compared to a maximum macroblock level. In the embodiment discussed this threshold test is set at 85%. If the frame passes this 85% threshold value, it is effectively treated as a keyframe and the part of the process dealing with the treatment of keyframes is run.
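Taken together, the two tests above amount to the following sketch in Python. The 85% threshold comes from the text; the function name and the classification labels are illustrative assumptions:

    KEYFRAME_THRESHOLD = 0.85   # proportion of INTRA macroblocks treated as a keyframe

    def classify_frame(intra_macroblocks, max_macroblocks):
        """Classify a decoded frame from its INTRA-coded macroblock count."""
        if intra_macroblocks == max_macroblocks:
            return "keyframe"            # fully intra-coded frame
        if intra_macroblocks >= KEYFRAME_THRESHOLD * max_macroblocks:
            return "near-keyframe"       # treated as a keyframe downstream
        return "ordinary"

    # A CIF picture holds 396 macroblocks (22 x 18).
    print(classify_frame(396, 396))   # keyframe
    print(classify_frame(340, 396))   # near-keyframe (about 86%)
    print(classify_frame(200, 396))   # ordinary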

If the received frame fails the macroblock or intra-coded threshold test, it is forwarded to a standard encoding system which produces the bulk of the encoded output required. This encoding system will encode the frame in either inter-coded or intra-coded form depending on its internal parameters.

If the received frame is not confirmed as a keyframe yet a keyframe is expected, a test is completed to determine whether the time since the last keyframe is greater than or equal to the maximum time allowable between keyframes. If this test results in a true value, then the maximum time between keyframes allowed is increased and the frame is subsequently sent to the standard encoding system. Conversely, if the time between keyframes is lower than the maximum time involved, the frame is simply sent to the standard encoding system.

The maximum time between keyframes value is then employed to test whether the system should encode the current frame it receives as a keyframe or as an inter-coded frame.

If the system confirms that a keyframe has been received, or tests the quality of the received frame and determines that it is of a high enough quality to be treated as a keyframe, the time since the last keyframe was received is retrieved. Next, a test is completed to determine whether the current keyframe was received after a maximum time threshold value. If this maximum time threshold has been exceeded, then the system or process provided will force the encoding of the current frame as a keyframe in the encoded output. If this time threshold has not been exceeded, then the current frame is supplied to the standard encoding system.
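A condensed sketch of this decision flow is given below in Python. All names here are placeholders rather than identifiers from the specification; standard_encode and force_keyframe stand in for the standard encoding system and for forced intra-coding, and the extension increment is an assumed value:

    from dataclasses import dataclass
    from types import SimpleNamespace

    EXTENSION = 10   # assumed increment while awaiting a promised keyframe

    @dataclass
    class EncoderState:
        frame_no: int = 0
        last_keyframe_no: int = 0
        max_spacing: int = 150
        fast_update_sent: bool = False

    def standard_encode(frame):
        """Stand-in for the standard encoding system (inter or intra coding)."""

    def force_keyframe(frame):
        """Stand-in for forced intra-coding of the current frame."""

    def process_frame(frame, state):
        # `frame` is assumed to carry is_keyframe and intra_ratio attributes.
        elapsed = state.frame_no - state.last_keyframe_no
        if frame.is_keyframe or frame.intra_ratio >= 0.85:
            if elapsed > state.max_spacing:
                force_keyframe(frame)           # overdue: force a keyframe
                state.last_keyframe_no = state.frame_no
            else:
                standard_encode(frame)
        elif state.fast_update_sent:
            if elapsed >= state.max_spacing:
                state.max_spacing += EXTENSION  # keyframe expected: wait longer
            standard_encode(frame)
        else:
            standard_encode(frame)
        state.frame_no += 1

    # Demonstration with an assumed fully intra-coded frame.
    demo = SimpleNamespace(is_keyframe=True, intra_ratio=1.0)
    process_frame(demo, EncoderState(frame_no=400, last_keyframe_no=250))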

FIGS. 8, 9 and Tables 2 and 3 illustrate the encoding of keyframes and index markers in accordance with a further embodiment of the present invention.

In the initial stage of the process shown with respect to FIG. 8, the same steps are taken as discussed with respect to FIG. 7 for the encoding of keyframes. However, this process deviates at the point where keyframes would normally be encoded.

In the process described, the encoding of a keyframe into the encoded output is delayed until the keyframe required is received from the videoconference. This process also tests a low time threshold value to determine whether the index marker received will be encoded within a specific time period or time displacement from a keyframe. If there is no existing keyframe available within the time period required, then the existing frame will be force encoded as a keyframe. Conversely, if a keyframe is available, the standard encoding process can be employed.
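A minimal sketch of this delayed anchoring follows; the threshold value and the function name are assumptions made for illustration:

    LOW_TIME_THRESHOLD = 25   # assumed frames an index marker may wait for a keyframe

    def anchor_index_marker(marker_frame, next_keyframe_frame):
        """Choose how to anchor an index marker to a keyframe."""
        if (next_keyframe_frame is not None
                and next_keyframe_frame - marker_frame <= LOW_TIME_THRESHOLD):
            return "standard"   # a keyframe arrives soon enough: encode normally
        return "force"          # no keyframe in time: force-encode this frame

    print(anchor_index_marker(150, 160))    # standard: keyframe within threshold
    print(anchor_index_marker(150, None))   # force: no keyframe expected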

The additional index status procedure discussed with respect to FIG. 9 and Table 3 allows for the monitoring or tracking of two concurrent or consecutive index marker events, and also for the encoding of any index markers required. This allows one of these index markers to be discarded if it is clear that the operators or participants in the videoconference involved erroneously triggered the index marking event and subsequently or immediately returned the videoconference equipment to its prior state or existing configuration.
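One plausible reading of this double-event filtering, expressed as an illustrative Python sketch rather than the patented routine itself, with an assumed undo window:

    def filter_index_events(events, undo_window=5):
        """Drop a marker that is reverted within `undo_window` frames.

        `events` is a list of (frame_number, switch_on) pairs."""
        kept = []
        i = 0
        while i < len(events):
            frame, switch_on = events[i]
            nxt = events[i + 1] if i + 1 < len(events) else None
            if nxt and nxt[1] != switch_on and nxt[0] - frame <= undo_window:
                i += 2          # trigger plus immediate undo: discard both
            else:
                kept.append(events[i])
                i += 1
        return kept

    # An operator switches a document camera on at frame 100 and off again
    # at frame 103; both events are discarded as an erroneous trigger.
    print(filter_index_events([(100, True), (103, False), (400, True)]))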

FIG. 10 & Table 4 illustrate the provision of an adaptive content playout mechanism employing a buffer to accelerate the encoding of content when low content states are detected.

In the implementation discussed, a freeze picture protocol signal is used to determine that a low content state exists. The buffer data structure is maintained and modified by the processes shown to speed up or slow down the time-based rate of encoding, depending on whether the video freeze picture signal involved has been maintained or has been released.
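As a rough sketch, assuming illustrative rate values and names, the rate selection could read:

    NORMAL_RATE = 1.0
    ACCELERATED_RATE = 2.0   # assumed: drain buffered content at twice real time

    def encoding_rate(freeze_active, buffered_frames):
        """Pick the time-based encoding rate for buffered conference content."""
        if freeze_active and buffered_frames > 0:
            return ACCELERATED_RATE   # low content: compress time, drain buffer
        return NORMAL_RATE            # freeze released: fall back to real time

    print(encoding_rate(True, 120))    # 2.0 while the freeze picture holds
    print(encoding_rate(False, 120))   # 1.0 once the freeze is released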

Aspects of the present invention have been described by way of example only and it should be appreciated that modifications and additions may be made thereto without departing from the scope thereof as defined in the appended claims.

CLAIMS

1. A method of encoding audio visual media signals that are part of a videoconference with a recording apparatus, comprising: configuring the recording apparatus as a participant in the videoconference, receiving, at the recording apparatus, a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal, reading, at the recording apparatus, one or more protocol signals from the computer network pertaining to the videoconference transmission, applying, at the recording apparatus, a selected encoding process to a received audio visual signal to generate an encoded videoconference, said encoding process being selected depending on the contents of said at least one protocol signal read, storing the generated encoded videoconference in a memory device associated with the recording apparatus, and outputting, at the recording apparatus, the encoded videoconference stored in the memory device to a reproduction device through the computer network.

2. The method of encoding as claimed in claim 1, further comprising: transmitting, by the recording apparatus, a mute audio signal and a blank video signal so the recording apparatus is maintained as a participant to the videoconference.
3. The method of encoding as claimed in claim 2, wherein the reproduction device did not directly participate in the videoconference.

4. The method of encoding as claimed in claim 2, further comprising: time compressing output audio visual content of the encoded output when a low content state is detected from a received protocol signal.

5. The method of encoding as claimed in claim 4, wherein the time compressing includes using a buffer to time compress the audio visual content of the encoded output.

6. The method of encoding as claimed in claim 1, wherein the one or more protocol signals includes information regarding any combination of one or more of the following parameters associated with an audio visual signal of a videoconference transmission: (i) audio codec employed, (ii) video codec employed, (iii) the bit rate of audio information supplied, (iv) the bit rate of video information supplied, (v) the video information frame rate, or (vi) the video information resolution.

7. The method of encoding as claimed in claim 1, wherein a content of the one or more protocol signals is used to detect a time position of at least one keyframe present within an audio visual signal of the videoconference transmission.

8. The method of encoding as claimed in claim 1, wherein the contents of said at least one read protocol signal indicates a content switch present within an audio visual signal of the videoconference transmission.

9. The method of encoding as claimed in claim 8, further comprising: detecting a content switch from a freeze picture signal extracted from a protocol signal.

10. The method of encoding as claimed in claim 8, further comprising: detecting a content switch from a removal of a freeze picture request signal extracted from a protocol signal.

11. The method of encoding as claimed in claim 8, further comprising: detecting a content switch from a document camera signal extracted from a protocol signal.

12. The method of encoding as claimed in claim 8, further comprising: detecting a content switch from removal of a document camera signal extracted from a protocol signal.

13. The method of encoding as claimed in claim 8, further comprising: detecting a content switch from an image incorporation signal extracted from a protocol signal.
14. The method of encoding as claimed in claim 8, further comprising: detecting a content switch from removal of an image incorporation signal extracted from a protocol signal.

15. The method of encoding as claimed in claim 8, further comprising: detecting a content switch from a camera movement signal extracted from a protocol signal.

16. The method of encoding as claimed in claim 8, further comprising: detecting the content switch and triggering an association of at least one index marker with the encoded output at a corresponding time position in the encoded output at which the content switch was detected.

17. The method of encoding as claimed in claim 16, wherein the at least one index marker includes reference information indicating what content switch was detected.

18. The method of encoding as claimed in claim 16, wherein a protocol signal indicates a time position of at least one keyframe present within an audio visual signal of the videoconference transmission, and the method further comprising: including keyframes in the encoded output that are positioned adjacent to or in a same position as index markers encoded into said encoded output.

19. The method of encoding as claimed in claim 18, wherein keyframes included in the encoded output are positioned within a threshold time from an index marker.

20. The method of encoding as claimed in claim 18, wherein the keyframes are included in the encoded output at the same time position as index markers.
21. A storage medium containing therein a program which, when executed by a computer, causes the computer to perform the method of claim 1.

22. A method of encoding audio visual media signals with an encoding apparatus, the method comprising: receiving, at the encoding apparatus, a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal, and reading, at the encoding apparatus, one or more protocol signals, applying, at the encoding apparatus, a selected encoding process to a received audio visual signal, said encoding process being selected depending on contents of said at least one protocol signal read, wherein the content of a read protocol signal is used to detect the time position of at least one keyframe present within an audio visual signal of the videoconference transmission; and encoding keyframes into an encoded output at a same time position as keyframes are detected in an audio visual signal of the videoconference transmission.

23. A method of encoding audio visual media signals with an encoding apparatus, the method comprising: receiving, at the encoding apparatus, a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal, reading, at the encoding apparatus, one or more protocol signals, determining, at the encoding apparatus, a time position of a first keyframe present within an audio visual signal received, and encoding, at the encoding apparatus, a second keyframe into an encoded output at a same time position at which the first keyframe was detected in an originally received audio visual signal.

24. A storage medium containing therein a program which, when executed by a computer, causes the computer to perform the method of claim 23.

25. A method of encoding audio visual media signals with an encoding apparatus, the method comprising: receiving, at the encoding apparatus, a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal, reading, at the encoding apparatus, one or more protocol signals, detecting, at the encoding apparatus, a content switch within the audio visual content of a received audio visual signal or signals, and encoding, at the encoding apparatus, an index marker at a time position at which the content switch was detected.
26. The method of encoding as claimed in claim 25, wherein index markers are encoded within a time threshold from a time position of a keyframe.
27. A storage medium containing therein a program which, when executed by a computer, causes the computer to perform the method of claim 25.

28. A method of encoding audio visual media signals with an encoding apparatus, the method comprising: receiving, at the encoding apparatus, a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal, reading, at the encoding apparatus, one or more protocol signals, detecting, at the encoding apparatus, a content switch within the audio visual content of a received audio visual signal, encoding, at the encoding apparatus, a keyframe, and encoding, at the encoding apparatus, an index marker at a same time position or adjacent to the time position of the keyframe encoded.

29. A storage medium containing therein a program which, when executed by a computer, causes the computer to perform the method of claim 28.

30. A method of encoding audio visual media signals with an encoding apparatus, the method comprising: receiving, at the encoding apparatus, a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal, reading, at the encoding apparatus, one or more protocol signals, detecting, at the encoding apparatus, an existence of a low content state present within a received audio visual signal or signals, and time compressing, at the encoding apparatus, the encoded output content during a time period in which said low content state is detected within the videoconference transmission received.

31. The method of encoding as claimed in claim 30, further comprising: using a buffer to receive videoconference transmission signals, wherein a rate at which contents of the buffer is read out to an encoding process determines a degree of time compression applied in the time compressing.

32. A storage medium containing therein a program which, when executed by a computer, causes the computer to perform the method of claim 30.

33. An apparatus for encoding audio visual media signals, said apparatus comprising: a receiving unit configured to receive a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal; a processor configured to, read one or more protocol signals from the computer network pertaining to the videoconference transmission, and apply a selected encoding process to a received audio visual signal to generate an encoded videoconference, said encoding process being selected depending on the contents of said at least one protocol signal read; a memory device configured to store the generated encoded videoconference; and an output unit configured to output the encoded videoconference stored in the memory device to a reproduction device through the computer network.
34. An apparatus for encoding audio visual media signals, said apparatus comprising: a receiving unit configured to receive a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal; and a processor configured to, read one or more protocol signals, determine a time position of a first keyframe present within an audio visual signal received, and encode a second keyframe into an encoded output at a same time position at which the first keyframe was detected in an originally received audio visual signal.

35. An apparatus for encoding audio visual media signals, said apparatus comprising: a receiving unit configured to receive a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal; and a processor configured to, read one or more protocol signals, detect a content switch within the audio visual content of a received audio visual signal or signals, and encode an index marker at a time position at which the content switch was detected.

36. An apparatus for encoding audio visual media signals, said apparatus comprising: a receiving unit configured to receive a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal; and a processor configured to, read one or more protocol signals, detect a content switch within the audio visual content of a received audio visual signal, encode a keyframe, and encode an index marker at a same time position or adjacent to the time position of the keyframe encoded.

37. An apparatus for encoding audio visual media signals, said apparatus comprising: a receiving unit configured to receive a videoconference transmission from a computer network, said videoconference transmission including at least one audio visual signal and at least one protocol signal; and a processor configured to, read one or more protocol signals, detect an existence of a low content state present within a received audio visual signal or signals, and time compress the encoded output content during a time period in which said low content state is detected within the videoconference transmission received.